2. Errata Descriptions and Workarounds


Erratum #1: 


A diagnostic read of a fully associative Translation Lookaside 
Buffer (TLB) translation table entry may return incorrect data. 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4. 


Description: 
This problem happens under the following conditions:


• Data TLBs: Any memory access instruction that misses the Data TLB and is
followed by a diagnostic read access (ldxa from ASI_DTLB_DATA_ACCESS_REG,
ASI=0x5d) from the fully associative TLBs, where the target TTE has its page
size set to 64 KB, 512 KB, or 4 MB.


• Instruction TLBs: Any instruction that misses the Instruction TLB and is followed
by a diagnostic read access (ldxa from ASI_ITLB_DATA_ACCESS_REG,
ASI=0x55) from the fully associative TLBs, where the target TTE has its page
size set to 64 KB, 512 KB, or 4 MB.

Impact: 


The data returned from the Translation Table Entry (TTE) will be incorrect.


Workaround: 


This problem can be overcome by reading the fully associative TLB TTE twice, 
back to back. The first access may return incorrect data if the above conditions 
are met; however, the second access will return correct data. 
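
The double-read workaround can be sketched in C as follows. This is a minimal illustration, not vendor code: the callback type `tlb_diag_read_fn` stands in for whatever routine issues the diagnostic ldxa, and `mock_read` is a purely hypothetical software stand-in used to show the behavior.

```c
#include <stdint.h>

/* Hypothetical callback: performs ONE diagnostic read (ldxa) of the
 * fully associative TLB entry `entry`. Not a real API. */
typedef uint64_t (*tlb_diag_read_fn)(unsigned entry, void *ctx);

/* Read the TTE twice, back to back, and keep only the second value. */
static uint64_t tlb_diag_read_safe(tlb_diag_read_fn read_once,
                                   unsigned entry, void *ctx)
{
    (void)read_once(entry, ctx);   /* first read may be wrong: discard it */
    return read_once(entry, ctx);  /* second read is the one to use */
}

/* Illustration-only stand-in for the hardware: the first read returns a
 * bogus value, later reads return the stored TTE. */
struct mock_tlb {
    uint64_t tte;
    unsigned reads;
};

static uint64_t mock_read(unsigned entry, void *ctx)
{
    struct mock_tlb *m = ctx;
    (void)entry;
    return (m->reads++ == 0) ? ~m->tte : m->tte;
}
```

The two reads must not be separated by other memory accesses that could miss the TLB, since the erratum is triggered by a preceding TLB miss.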

Status: 


This bug will not be fixed in future releases of the silicon.


Erratum #2:


An incorrect context number is placed in the Data Memory 
Management Unit (DMMU) Tag Access register after a 
data_access_exception trap. 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4. 


Description: 


When a floating point load on either ALU pipe (A0/A1) takes a 
data_access_exception trap, the wrong context number may be saved in the 
context field of the Data Memory Management Unit (DMMU) Tag Access 
register (ASI 0x58, VA=0x30). 

Impact: 


The incorrect context number is provided to the data_access_exception trap 
handler. 
Workaround: 


Use the CT field (context ID) of the DMMU Synchronous Fault Status register
(SFSR, ASI 0x58, VA=0x18). The data_access_exception trap handler should
refer to either the primary, secondary, or nucleus context registers to convert
the 2-bit context ID into a 13-bit context number.
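
The conversion from the 2-bit context ID to a context number might look like the sketch below. The CT field position (bits [5:4]) and its encoding (0 = primary, 1 = secondary, 2 = nucleus, 3 = reserved) are assumptions that must be checked against the user manual; the structure and function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: SFSR.CT occupies bits [5:4]; verify against the manual. */
#define SFSR_CT_SHIFT 4
#define SFSR_CT_MASK  0x3u
#define CONTEXT_MASK  0x1fffu   /* 13-bit context number */

/* Values the trap handler has already read from the context registers. */
struct mmu_contexts {
    uint64_t primary;
    uint64_t secondary;
    uint64_t nucleus;
};

/* Returns 0 and stores the context number, or -1 for a reserved CT value. */
static int context_from_sfsr(uint64_t sfsr, const struct mmu_contexts *regs,
                             uint16_t *ctx_out)
{
    switch ((sfsr >> SFSR_CT_SHIFT) & SFSR_CT_MASK) {
    case 0: *ctx_out = (uint16_t)(regs->primary & CONTEXT_MASK);   return 0;
    case 1: *ctx_out = (uint16_t)(regs->secondary & CONTEXT_MASK); return 0;
    case 2: *ctx_out = (uint16_t)(regs->nucleus & CONTEXT_MASK);   return 0;
    default: return -1;          /* reserved encoding (assumed) */
    }
}
```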

Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #3:


A read to a non-existent Jalapeno Bus (JBus) Agent ID can cause
the CPU to hang if ESTATE.PERREN is disabled.


Applicability:
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4.


Description: 


If ESTATE.PERREN is programmed to 0, prefetch read requests and other 
read requests can interact in the JBus Unit (JBU) to cause a CPU hang 
condition. When ESTATE.PERREN is enabled, reads to non-existent JBus
Agent IDs will normally result in an unmapped error in the Asynchronous Fault
Status Register (AFSR.TO) and a deferred trap, followed later by a hardware
time-out (AFSR.JETO) that resets the CPU.


Impact: 
A CPU hangs when ESTATE.PERREN is 0. 


There should be no reason for software to disable fatal reset traps on JBus 
protocol errors by writing ESTATE.PERREN to 0. When these traps are 
disabled, errors which can corrupt data and system state (AFSR.JETO, 
AFSR.JEIC, AFSR.JEIS, AFSR.JEIT) will not cause a fatal reset and will result 
in unpredictable behavior. Software should enable this bit under all conditions. 


Workaround: 


Software should not disable ESTATE.PERREN. In addition, software should 
not do any probing with reads to invalid addresses that will result in a fatal 
reset. If address probing needs to be done, stores should be used. Stores will 
cause AFSR.TO to be set, but will not hang the CPU. 


The best alternative for software that needs to find the existing memory space
and JBus agents is to read the JBUS_CONFIG register, which contains the existing
Agent IDs in bits [50:44]. For each bit that is set, that particular agent exists
and cacheable space exists (e.g., if JBUS_CONFIG[45]==1, cacheable space
with PA[40:36]==0b00001 exists).
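
As a rough sketch of this probing-free discovery, the helpers below decode an already-read JBUS_CONFIG value. They assume that bit 44+n corresponds to agent n and that the agent number appears in PA[40:36]; this generalizes the single bit-45 example above and should be confirmed against the manual. The function names are hypothetical.

```c
#include <stdint.h>

#define JBUS_CONFIG_AGENT_LSB   44  /* agent-present bits are [50:44] */
#define JBUS_CONFIG_AGENT_COUNT 7

/* Returns 1 if the agent-present bit for `agent_id` is set, else 0.
 * ASSUMPTION: bit 44 + n <-> agent n. */
static int jbus_agent_present(uint64_t jbus_config, unsigned agent_id)
{
    if (agent_id >= JBUS_CONFIG_AGENT_COUNT)
        return 0;
    return (int)((jbus_config >> (JBUS_CONFIG_AGENT_LSB + agent_id)) & 1u);
}

/* Base physical address of the agent's cacheable space.
 * ASSUMPTION: PA[40:36] carries the agent number. */
static uint64_t jbus_agent_cacheable_base(unsigned agent_id)
{
    return (uint64_t)agent_id << 36;
}
```

Software would loop over agent IDs 0 to 6 and touch only address ranges whose agent is reported present, so no read is ever issued to a non-existent agent.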


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #4:


Diagnostic loads/stores to Address Space Identifier (ASI) 0x4e
(ASI_ECACHE_TAG or ASI_ECACHE_FLUSH) during normal
operation can break coherency or hang the CPU.


Applicability:
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4.


Description: 


The following operations can result in a CPU hang or coherency corruption if 
done during normal operation: 


• An attempt to invalidate an L2 tag through the use of ASI 0x4e.
• Loads and stores to ASI 0x4e (L2 tag access or L2-cache flush) that interact with
snoops, copybacks, other stores, and cache victim lines.

Impact: 
CPU can hang or lose coherency when intermixing normal L2 tag behavior with 
ASI 0x4e loads and stores. 
Workaround: 


All L2 tag diagnostic accesses or flushes should be surrounded with Memory 
Barrier Instructions (MEMBARs) and run out of non-cacheable space to protect 
the software from running into this case. 

Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #5:


Clarification of the UltraSPARC-IIli User Manual description of the
Prefetch Cache-related Data Cache Unit (DCU) Control Register 
bits. 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4. 


Description: 


The Joint Programming Specification (JPS1) supplement text "PE=0 causes all 
prefetches to be handled as P-cache misses" is misleading. 


A more detailed description of the behavior of the P-cache Data Cache Unit
(DCU) control register bits is as follows:


• PE Prefetch Cache Enable (bit 45): If cleared, no hardware prefetches will be
generated and all software prefetches will be treated as No Operation (NOP). No
Floating Point (FP) load miss data (32B) will be installed from the L2-cache into the
P-cache and all P-cache references for FP-loads are treated as a P-cache miss.


• HPE Prefetch Cache Hardware Prefetch Enable (bit 44): If cleared, the P-cache
does not generate any hardware prefetch requests to the L2-cache and no FP-load
miss data (32B) is installed from L2-cache to the P-cache.


• SPE Software Prefetch Enable (bit 43): If cleared, software prefetch instructions
do not generate a request to the pipeline. They will be treated as NOPs.


Usage: 


• For hardware prefetch generation, both HPE and PE should be turned ON.
• For software prefetch generation, both SPE and PE should be turned ON.
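
The usage rules can be expressed as two predicates over a DCU control register value. This is a small sketch using the bit positions given in this erratum; the macro and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit positions as stated in this erratum. */
#define DCU_PE  (1ULL << 45)  /* Prefetch Cache Enable */
#define DCU_HPE (1ULL << 44)  /* Hardware Prefetch Enable */
#define DCU_SPE (1ULL << 43)  /* Software Prefetch Enable */

/* Hardware prefetches are generated only when both PE and HPE are set. */
static bool dcu_hw_prefetch_active(uint64_t dcu)
{
    return (dcu & DCU_PE) && (dcu & DCU_HPE);
}

/* Software prefetches are issued (not treated as NOPs) only when both
 * PE and SPE are set. */
static bool dcu_sw_prefetch_active(uint64_t dcu)
{
    return (dcu & DCU_PE) && (dcu & DCU_SPE);
}
```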


Status: 


This erratum updates the documentation. 


Erratum #6:


In a Multi-processor (MP) system, replacing a load with a store
while it is executing can cause a hang.


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4. 


Description: 
Two cases have been found that exhibit this hang:


• In the first case, a load that misses the D-cache is recirculating and is
subsequently changed into a store. The Integer Execution Unit (IEU) Store Queue
(STQ) counter then hits an illegal state and the machine hangs.

• The second case is similar to the first: the load is recirculating, waiting for
data from memory. If the load is changed into an ALUop, the IEU STQ counter does
not hit the illegal state, but if the next memory reference in the instruction
stream is a store, the same problem can be seen.


Impact: 


When a load that missed the Data Cache (D-cache) is recirculating and it is
changed into a store, the system hangs.


Workaround:


Do not write self-modifying code that changes a possibly-executing load into a 
store or changes that load into an ALUop such that the next memory reference 
will be a store. It’s best to leave the load alone until one is sure that it will not 
be executed. 


The generic workaround is for the system software to ensure that when
memory is mapped by an Instruction Translation Look-Aside Buffer (iTLB) entry
in one processor, no other processor has a writable Data Translation
Look-Aside Buffer (dTLB) entry for the same memory. This approach adds
complexity and costs performance, however (e.g., slower TLB misses and
reduced scalability due to more demaps).

Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #7:


It is not legal to relax the bstore/bstore* and MEMBAR #Sync rule
when Physical Address PA[12:5] match. Order is required. 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4. 


Description: 


The Memory Barrier Instruction (MEMBAR) rules table (i.e., table 8-3 in the
UltraSPARC-IIli User Manual) shows that neither the bstore followed by bstore
commit case nor the bstore followed by bstore case requires a MEMBAR
#Sync when there is a Virtual Address VA[12:5] match. This is incorrect. The
MEMBAR #Sync is required.

Impact: 


The failure mode is “wrong data”. 


Status: 


This erratum updates the documentation.


Erratum #8:


The UltraSPARC-IIli has only a one-deep queue for servicing
noncacheable reads with RDERR returns. 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4. 


Description: 


The UltraSPARC-IIli does not have any accessible noncacheable space, and
any noncacheable read directed at a CPU should be answered with an
unmapped RDERR packet on the JBus. Noncacheable reads, however, are not
set up with the same flow control as normal reads, and there is just a one-entry
queue for noncacheable read responses.


If there is a RDERR packet waiting to go out on the JBus and another 
noncacheable read is received, the ADTYPE of the first will be overwritten by 
the second, causing two RDERR return packets to be sent to the same device. 
If the sources of the two noncacheable reads are different devices, this may 
hang the system. 


If there are multiple noncacheable reads directed at a single CPU and the
RDERR response to the first noncacheable read cannot be serviced due to the
outgoing queue being full, the second noncacheable read will overwrite the
first. The first read will not get a response and will eventually time out. The
second will be responded to with an unmapped RDERR packet. In this case, a
JBus read is dropped, and a hardware time-out is the most likely consequence.


Impact: 

If multiple noncacheable reads are directed to a single CPU, a RDERR packet
can be returned to the wrong device. It is also possible for a noncacheable
read directed at the CPU to be dropped, causing a system time-out and a fatal
reset if enabled.

Workaround: 

None. Software must not probe devices using noncacheable reads, since this
can cause a hang or a hardware time-out and system reset.

Status: 


This bug will not be fixed in future releases of the silicon.


Erratum #9:


Diagnostic Address Space Identifier (ASI) writes to the L2 tag with 
bit VA[31]=1 will hang the CPU. 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4. 


Description: 


The ASI register address for both L2-cache flushes and diagnostic L2 tag 
accesses is 0x4E. For flushes, VA[31] should be one for the ASI read and for 
diagnostic accesses, VA[31] should be zero for the ASI read/write. If software 
attempts a diagnostic L2 tag write with VA[31]=1, the L2-cache unit thinks that 
it is doing a flush operation and after the tag write, will wait for the fill data to 
come from the JBU, which never happens, thereby hanging the CPU. 


Impact: 
The system hangs during Address Space Identifier (ASI) writes to the L2 tag
with Virtual Address VA[31]=1.


Workaround: 
Software must not write ASI 0x4E with VA[31]=1. 
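
A defensive check in the diagnostic path is one way to enforce this rule. The sketch below only tests the address bit; the function name is hypothetical, and the actual ASI store is left to the caller.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true if a diagnostic L2 tag WRITE through ASI 0x4E may be issued
 * with this virtual address, i.e. VA[31] is 0. With VA[31]=1 the access is
 * treated as a flush, which is only valid as an ASI read. */
static bool l2_tag_diag_write_is_safe(uint64_t va)
{
    return ((va >> 31) & 1u) == 0;
}
```

A diagnostic routine would refuse or assert when this returns false, rather than issue the store and hang the CPU.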


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #10:


The mixture of fdiv and fsqrt instructions with a pdist instruction
causes an incorrect Floating Point (FP) register result.


Applicability: 
UltraSPARC-IIli Versions 2.3 and 2.4. 


Description: 


The sequence of code that generates the problem is unlikely by nature. There
is an fdiv or fsqrt instruction that has a write-after-write (WAW) hazard with a
subsequent FP or graphics operation. The second FP operation will overwrite
the result of the fdiv or fsqrt operation before it can be used, making the
sequence unlikely. There is a second fdiv or fsqrt instruction that is followed by
a pdist instruction that has both a read-after-write (RAW) and WAW hazard
with the rd of the pdist instruction. pdist is a special instruction that uses both
rs1 and rs2 as source operands and an integer in the rd as a source operand,
and then stores the results into rd. Also unlikely is a pdist that would normally
use the result of an fdiv or fsqrt instruction as its source operand.


What happens is that the WAW hazard between the first fdiv or fsqrt instruction
and the subsequent FP or GR operation is used incorrectly as a WAW to affect
the state of the later, second fdiv or fsqrt instruction, making it unable to
bypass its result to the following instructions (in this case, the pdist instruction).
When the pdist instruction executes, the rd register that it wants to use in its
computation appears to be available in the FP register file (FPRF), instead of
still being computed by the fdiv or fsqrt instruction. Thus, the pdist instruction
executes at the same time as the fdiv or fsqrt instruction does (which is
asynchronous to the pipeline) and uses the old rd from the FPRF incorrectly.


Impact: 

The result after a Pixel Component Distance (pdist) instruction execution is 
either incorrect or the pdist result appears to be dropped completely. 
Workaround: 


Ensure that the failing code sequence does not exist in current software 
routines. Check that no code has a pdist instruction following an fdiv or fsqrt 
instruction. 
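
One way to audit existing binaries is to scan instruction words for a pdist that follows an fdiv or fsqrt. The sketch below is an illustration under stated assumptions: the opcode values (FPop1 op3=0x34 with opf 0x4d-0x4f for fdiv and 0x29-0x2b for fsqrt; pdist as op3=0x36, opf=0x03e) are quoted from memory of the SPARC V9 and VIS encodings and should be verified, and the `window` parameter is a hypothetical knob, since the erratum does not say how far apart the instructions may be.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

static uint32_t op3_of(uint32_t insn) { return (insn >> 19) & 0x3f; }
static uint32_t opf_of(uint32_t insn) { return (insn >> 5) & 0x1ff; }

/* ASSUMED encodings: FPop1 (op=2, op3=0x34), fdiv{s,d,q} = 0x4d..0x4f,
 * fsqrt{s,d,q} = 0x29..0x2b. */
static bool is_fdiv_or_fsqrt(uint32_t insn)
{
    uint32_t opf;
    if ((insn >> 30) != 2 || op3_of(insn) != 0x34)
        return false;
    opf = opf_of(insn);
    return (opf >= 0x4d && opf <= 0x4f) || (opf >= 0x29 && opf <= 0x2b);
}

/* ASSUMED encoding: pdist is op=2, op3=0x36 (IMPDEP1), opf=0x03e. */
static bool is_pdist(uint32_t insn)
{
    return (insn >> 30) == 2 && op3_of(insn) == 0x36 && opf_of(insn) == 0x3e;
}

/* Returns the index of the first pdist found within `window` instructions
 * after any fdiv/fsqrt, or -1 if there is none. */
static long find_pdist_after_fdiv(const uint32_t *code, size_t n, size_t window)
{
    for (size_t i = 0; i < n; i++) {
        if (!is_fdiv_or_fsqrt(code[i]))
            continue;
        for (size_t j = i + 1; j < n && j - i <= window; j++)
            if (is_pdist(code[j]))
                return (long)j;
    }
    return -1;
}
```

A linear scan like this ignores control flow, so it can both miss sequences that span a branch and flag sequences that never execute together; treat its output as candidates for manual review.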

Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #11:


The memory controller corrupts read data in the presence of reads
to out-of-range addresses while in Energy Star (E-star) mode. 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, and 3.2. 


Description: 


When in Energy Star (E-star) mode and the software is generating out-of-range
accesses to memory (as is the case during an L2 flush), data corruption results
in a RDERR, internal processor error (IERR), and Timeout (TO).


Impact:


When a CPU is in E-star mode, the memory controller can return RDERR 
responses to valid memory access in the presence of out-of-range accesses. 
Typically, the out-of-range memory accesses are generated by the operating 
system when flushing the L2-cache in the presence of ECC errors. During the 
flushing, other CPUs in the system can be accessing the memory of the CPU 
that is having the L2 flushed. 


Workaround: 


Do not allow out-of-range accesses during E-star mode (i.e., do not allow
L2-cache flushing). There is no bit to disable address range checking in the MCU.


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #12:


An illegal L2 install state is logged as an Asynchronous Fault
Status Register System Interface Protocol Error (AFSR.JEIS) when 
flushing the L2-cache. 


Applicability: 
UltraSPARC-IIli Versions 2.3 and 2.4. 


Description: 


When the Memory Controller Unit (MCU) returns a read error because the 
address from an L2-cache flush is out-of-range, the logic in the JBU considers 
this a fast time-out and does not wait for all J_PACK snoops to return from the
Jalapeno Bus (JBus). The fill is sent to the L2-cache along with a time-out 
signal and the ECU will install the line in the L2 in the invalid state. 


Because the JBU logic detects valid install state for fills to the L2, and the 
request for a flush is RDS, the install state is detected as illegal because invalid 
is normally an illegal install state for RDS. An illegal install state is flagged 
(AFSR.JEIS). Installing in invalid state should be expected for the L2 flush
case.


Impact:


In the presence of multiple reads outstanding in the JBU while doing L2-cache 
flushing, a system reset occurs (if enabled) and the Asynchronous Fault Status 
Register System Interface Protocol Error (AFSR.JEIS) is logged. 


Workaround: 


When doing an L2-cache flush or any reads that access an out-of-range
address in the system, the bit that enables fatal resets for the JEIS should be
cleared (e.g., ESTATE.PERREN, ASI=0x4b, bit 21). After finishing, clear
AFSR.JEIS and re-enable the above bit for fatal resets on JBus protocol errors.


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #13:


The Asynchronous Fault Status Register AFSR.J_ARB_WIN field
can be incorrect in a 4-way system when Jalapeno Bus (JBus) 
errors are logged. 


Applicability: 
UltraSPARC-IIli Versions 2.3 and 2.4. 


Description: 


The AFSR.J_ARB_WIN field is intended to log the JBus driver at the time an
error is detected on the JBus. This logic works properly for JBus systems 
without the JBus repeater chip and in most cases with the repeater chip. 


A corner case exists: If an error is sent across the JBus repeater chip at the 
same time that a JBus arbitration grant is lost from the agent that is across the 
repeater to an agent on a local segment to the CPU that is logging the error, an 
incorrect driver ID will be logged. 


Impact: 


The Asynchronous Fault Status Register AFSR.J_ARB_WIN field shows a
different value than what is expected when a Timeout (TO) or Bus Error From
System Bus (BERR) error is logged when sent across the Jalapeno Bus (JBus)
repeater chip.


Workaround:


None. On systems that use the JBus repeater chip on the JBus (i.e., typically 
4-socket systems), the AFSR.J_ARB_WIN field cannot be guaranteed to be
correct in all cases when JBus errors are detected. 

Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #14:


L2 Cache ECC errors can be caused by data corruption in the
Memory Controller Unit (MCU). 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, and 3.2. 


Description: 


Under corner-case conditions, data can be mishandled by the Memory Data
Buffer within the Memory Controller Unit (MCU). This mishandling results in data
corruption for a local read.


Impact: 


The L2-cache has data corrupted in one 16-byte word of a cacheline (usually
the last 16 bytes). The corruption can be detected as either an L2-cache ECC 
error or not detected at all. The data and check bits will be completely 
corrupted. The data and check bits will appear as an AND of the correct data 
and another piece of data recently read from memory. 


Workaround: 


There is only one complete workaround:


1. Use a memory ratio below 10:1, since it is not possible to see this problem for
memory ratios 8:1 and 9:1.


The following workarounds reduce the frequency of the bug, but do not
eliminate the possibility of seeing it:


2. Use an ASI_MCU_CTL_REG2.AREN setting where AREN + (CASL * MCLK ratio)
is an even number.

3. Disable the Early Access feature (i.e., clear bit 29 of ASI_MCU_CTL_REG1).

4. Set the Memory Data Buffer size to 1 entry (i.e., clear bit 28 of
ASI_MCU_CTL_REG1).


Workarounds 3 and 4 will work for any combination of ratios. Workaround 3
has a smaller performance impact.
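
The parity condition in workaround 2 can be checked before programming the register. This is a minimal sketch that assumes AREN, CASL, and the MCLK ratio are all available as unsigned integers; the function name is hypothetical.

```c
#include <stdbool.h>

/* Returns true when AREN + (CASL * MCLK ratio) is even, the condition
 * named in workaround 2. ASSUMPTION: all three values are integers. */
static bool aren_setting_reduces_exposure(unsigned aren, unsigned casl,
                                          unsigned mclk_ratio)
{
    return ((aren + casl * mclk_ratio) % 2u) == 0;
}
```

For instance, with CASL = 3 and an MCLK ratio of 3, an AREN of 1 gives 10 (even) and passes, while an AREN of 2 gives 11 (odd) and does not.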


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #15:


In a 4-way system, a live-lock case on the JBus causes a Memory
Controller Unit Time Out (MCU TO), which results in an AFSR.IERR 
logging and a system reset. 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, and 3.2. 


Description: 


With a specific kind of JBus traffic and CPU traffic in a 4-way system, a 
performance issue is created during internal arbitration in the JBus Transaction 
Issue Unit (TIU). The performance issue becomes a functional issue when the 
Memory Read Queue times out after it is unable to issue a request for 8,000
cycles. The end result is an internal error and a fatal reset if enabled. 


Impact: 


An AFSR.IERR is logged and the system resets when there is a stream of local 
block stores (causing INV transactions) and high JBus utilization with many 
outstanding reads. A cyclical pattern appears on the JBus, which prevents data 
from returning from memory. 


Workaround: 


The easiest way to overcome this case is to intersperse the local block stores
(or writebacks) with reads that don't cause writebacks such that the string of
INV transactions is interrupted, allowing read data to go out onto the bus from
the Data Out Buffer (DOB). Any of the following can break the live-lock:


• Other agents are not requesting the bus constantly.


• AOK turns off for a long time because of too many outstanding addresses.


• DOK turns off.


• Any other traffic from the local CPU to the Jalapeno Bus (JBus) other than INV
transactions.


• Any foreign request that hits in the L2-cache.


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #16:


On a local DRAM Uncorrectable Error (UE), the two LSBs of the
check bits from memory are flipped when installed in the L2. 


Applicability: 
UltraSPARC-IIli Versions 2.3 and 2.4. 


Description: 


On other UltraSPARC-III-based processors, the L2 is installed with the data 
and check bits that were read from DRAM on an uncorrectable error (UE). 


UltraSPARC-IIli creates a signalling UE case, where new ECC check bits are 
generated for the data and then the lower two bits of the check bits are flipped. 
If the line is flushed (i.e., if the data is installed in M state), this causes a UE 
when read from the L2, with a syndrome of 0x003. This notifies software that 
this data is a result of a previous UE somewhere else in the system. However, 
in this case new ECC check bits are not generated and UltraSPARC-IIli flips
the two LSBs that were read from DRAM. 


In the case of a DRAM UE, the data is normally installed in the L2 in the Invalid 
state, but can be installed in the Modified state in the case of a write. 
Impact: 


On a local DRAM uncorrectable error (AFSR.UE), the data and check bits 
installed in the L2 are not the data and check bits read from the DRAM. The 
two LSBs of the check bits are flipped. 


Workaround: 


None. Currently, software should panic on a UE and the kernel will scrub the
L2 as well as the DRAM. This could become a problem if software is hardened
to retry memory operations.


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #17:


An SDRAM out-of-range write (AFSR.UMS) may corrupt the
following SDRAM read. 


Applicability: 
UltraSPARC-IIli Versions 2.3 and 2.4. 


Description: 


When an out-of-range physical address (PA) SDRAM write is followed 
immediately by an in-range PA DRAM read having the same DEVICE, BANK, 
and ROW as the write (except for a difference in the out-of-range bits 
PA[31:28]), the read returns wrong data. 

Impact: 

If a write to memory with an out-of-range PA occurs (causing AFSR.UMS to be
set), a memory read immediately following the write may return corrupted data.

Workaround: 


Software should not perform any writes to out-of-range memory locations. 


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #18:


In privileged mode, a store alternate using Address Space Identifier
(ASI) 0x64 hangs the processor.


Applicability:
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4.


Description: 


In privileged mode, a store alternate operation using Address Space Identifier 
(ASI) 0x64 should be flagged as usage of an illegal ASI and result in a 
data_access_exception trap. However, this checking is not present and the 
store is allowed to execute. The store, however, is never acknowledged since 
ASI 0x64 doesn't exist. 


Impact: 


The machine hangs in privileged mode. User mode behavior is correct, so a 
malevolent user cannot hang the machine. ASI 0x64 behavior is as follows:


             user mode                 privileged mode
ASI LOAD     privileged_action trap    data_access_exception trap
ASI STORE    privileged_action trap    CPU HANG


Workaround: 


Privileged code should not issue store alternates to ASI 0x64. 


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #19:


A delay slot involving Delayed Control Transfer Instructions
(DCTIs) may not be properly executed, and subsequent execution
of certain instructions may result in skipping the original 
instruction stream and instead executing a different instruction 
stream. 


Applicability: 
UltraSPARC-IIli Versions 2.3, 2.4, 3.2, and 3.4.


Description:


One example of a failing code sequence is as follows. The dcti-couple is
represented here as back-to-back branches: branch_1 and branch_2. The last
instruction, the add in the delay slot of a branch, is dropped on the nth iteration
through this code, but the number n is not significant:


            ldsb       [%l1], %o6
            bcs,pt     %icc, .-0xC
            fmovs      %f4, %f17
            nop
            fsubs      %f17, %f5, %f17
            be,a,pt    %xcc, .+0x4
            subcc      %o7, 0x2E4, %o7
            fcmps      %fcc0, %f17, %f6
branch_1:   bn,pn      %icc, .+0x0
branch_2:   fbue       .-0x18
d_slot_1:   andn       %o7, %o6, %o7
branch_3:   brlz,a,pt  %o5, .-0x54
d_slot_2:   movre      %l0, -0x0C3, %l3
branch_4:   brlez,pt   %i1, .-0x138
failure:    add        %o0, 0x001, %o0    // add does not execute


The delay slot of a mispredicted Delayed Control Transfer Instruction (DCTI)
may not be properly executed if the DCTI is the last part of a dcti-couple or
triple, or follows within several pipeline stages of a dcti-couple instruction pair.
Symptoms of the failure may include:


• The execution of a delay slot instruction from an older or a younger DCTI.


• Non-execution of the real delay slot.


• Skipping the instruction stream starting at a subsequent refetched instruction (e.g.,
a mispredicted or Jump and Link Instruction (JMPL) or return from subroutine
(RET) delay slot, recirculating Load Integer Instruction (LD), following FLUSH or
certain Write Privileged or Write State Register) or trapping instruction, and instead
executing the instruction stream at a PC 0x80 greater than that of the desired
instruction.


Spurious increment of the PC by 0x80 may also occur in the value saved by 
RDPC, TPC or TnPC on trap, or the return address of the CALL or JMPL. 


The following conditions are necessary for delay slot failure to occur: 


• The DCTI instruction is mispredicted.

• The delay slot that will not be handled properly reaches I-stage before its DCTI
reaches E-stage (delay slot must be in same cache line as DCTI or hit in icache).

• There must be at least two older DCTI instructions in C-stage or earlier that are not
JMPL/RET and not in the delay slot of an earlier DCTI when the mishandled delay
slot instruction reaches I-stage.

• There must be an older dcti-couple in C-stage or earlier when the mispredicted
DCTI reaches I-stage.

• Older unresolved DCTIs than the one with the mishandled delay slot must all be
predicted correctly.


The erroneous PC increment of 0x80 additionally requires: 


• The mispredicted branch is actually not taken.


• The PC[6] of the DCTI's delay slot and PC[6] of the next sequential instruction are
both 0, and PC[6] of the falsely requeued delay slot is 1.


Impact: 


A delay slot of a dcti-couple or a DCTI closely following a dcti-couple that 
reaches issue stage with older unresolved (C-stage or earlier) DCTIs may not 
be properly executed, and subsequent execution of either a refetched 
instruction or trapping instructions may result in skipping the original instruction 
stream and instead executing the instruction stream starting at PC=0x80 
greater than the original instruction’s PC. 


Workaround: 


Avoid DCTI couples. The SPARC V8 manual specifies unpredictable results for
them: If the first instruction of a DCTI couple is a conditional branch, the
targets of the DCTI are within the same address space as the DCTI couple, but
are otherwise unpredictable. Given the relative rarity of dcti-couples, this
problem is not viewed as particularly severe.
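
To check that a code image is free of DCTI couples, a tool can look for a DCTI sitting in the delay slot of another DCTI. The sketch below is illustrative only; it classifies instructions using the SPARC V9 format fields as recalled here (op=1 for CALL; op=0 with op2 of 1, 2, 3, 5, or 6 for the branch families; op=2 with op3 0x38 or 0x39 for JMPL and RETURN), which should be checked against the architecture manual. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the 32-bit instruction word is a delayed control
 * transfer instruction (ASSUMED encodings, see lead-in). */
static bool sparc_is_dcti(uint32_t insn)
{
    uint32_t op = insn >> 30;

    if (op == 1)                              /* CALL */
        return true;
    if (op == 0) {
        uint32_t op2 = (insn >> 22) & 0x7;    /* BPcc=1, Bicc=2, BPr=3, */
        return op2 == 1 || op2 == 2 ||        /* FBPfcc=5, FBfcc=6      */
               op2 == 3 || op2 == 5 || op2 == 6;
    }
    if (op == 2) {
        uint32_t op3 = (insn >> 19) & 0x3f;
        return op3 == 0x38 || op3 == 0x39;    /* JMPL, RETURN */
    }
    return false;
}

/* Returns the index of the first DCTI that is immediately followed by
 * another DCTI, or -1 if the buffer contains no such pair. */
static long find_dcti_couple(const uint32_t *code, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        if (sparc_is_dcti(code[i]) && sparc_is_dcti(code[i + 1]))
            return (long)i;
    return -1;
}
```

The scan is deliberately conservative: it also reports pairs whose first branch annuls its delay slot, which may never execute as a couple, so hits need manual confirmation.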


Status: 


This bug will not be fixed in future releases of the silicon.


Erratum #20:


A Read-after-Write address checking failure for a load in the delay
slot of a mispredicted Delayed Control Transfer Instruction (DCTI)
can result in stale Data Cache (D Cache) data. 


Applicability: 
UltraSPARC-IIli Versions 2.3 and 2.4. 


Description: 


In one special circumstance, the read-after-write (RAW) checking between
currently executing loads and existing stores pending in the Store Queue fails,
resulting in stale data being installed in the Data Cache (D Cache). From a
program point of view, it will appear as if a store operation completed normally
to External Cache (E Cache) and Memory, but did not update the D Cache.


This situation can only manifest itself when there is a load that has a RAW 
hazard in the delay slot of a Delayed Control Transfer Instruction (DCTI) (e.g., 
conditional branch) that is mispredicted. There is a distinction in RAW checking 
where some loads are bypassable and some are not. A read-after-write load 
that is bypassable can obtain the data directly from the Store Queue. A read- 
after-write load that is not bypassable must wait until the youngest store (there 
can be more than one) that touches the D Cache line in question has left the 
Store Queue. Here is an example sequence and the conditions that need to be 
met for this bug to occur. 


ST A    << Hits in D Cache
ST B    << Misses D Cache
BR X    << Must be mispredicted
LD C    << Same 32-byte line as B (non-bypassable RAW)
LD A    << Line A invalidated in D Cache before this LD (bypassable RAW)





In this case of the branch being mispredicted taken, but actually taken, the 
execution pipe trace may look like this. When a branch is mispredicted, the 
delay slot is always cancelled and re-executed (unless it was annulled). 


BR X    R E C M W X T D          < Mispredicted
        (x) (t) (d)              < Cancelled in C
        (w) (x) (t) (d)          < Cancelled in C
        R E C M W X              < Delay slot requeued from MisPred Q


The conditions are as follows:


• The branch is mispredicted, and LD C and LD A were cancelled in the C stage
due to the misprediction.


• LD A data is RAW bypassable from the previous store; the store data has been
retired into W Cache and is waiting to update D Cache; and line A has been
invalidated in D Cache.


Impact: 


When the above conditions are met, the Store Queue fails to detect the RAW 
condition from ST B and fetches line C (same line as B) and installs it into the 
D Cache. The ST B in the Store Queue is not aware of this action since it was 
a D Cache miss when it was entered onto the Store Queue (and, if working 
properly, a store that misses the D Cache should never become a D Cache 
hit). When Store B exits the Store Queue, it only updates the data in W Cache, 
not the line in the D Cache. 


The branch is mispredicted and not-taken. 


    0x5f16c554   ld     [%i1 + 0xa74], %o3
    0x5f16c558   sth    %l6, [%g2 + 0x3c]
    0x5f16c55c   st     %l3, [%g2 + 4]
    0x5f16c560   st     %f20, [%g2 + 0x14]    <<< RAW bypassable Store (E)
    0x5f16c564   sth    %o5, [%i1 - 0x300]
    0x5f16c568   st     %f20, [%g2 + 0x2c]    <<< RAW Hazard (D)
    0x5f16c56c   addcc  %l1, %l0, %l6
    0x5f16c570   stb    %o4, [%g2 + 0x27]     <<< RAW Hazard (D)
    0x5f16c574   addcc  %l4, %l0, %o1
    0x5f16c578   sth    %l7, [%g2 + 0x26]     <<< RAW Hazard (D)
    0x5f16c57c   st     %f20, [%g2 + 0x3c]    <<< RAW Hazard (D)
    0x5f16c580   be     0x5f16c588            <<< MISPRED, NOT-TAKEN
    0x5f16c584   ldsb   [%g2 + 0x37], %l7     <<< RAW check fail
    0x5f16c588   ld     [%g2 + 0x14], %f21
    0x5f16c590   andcc  %l1, %l4, %l3










Workaround: 


Turn off the RAW bypass enable (DCU.RE) bit in the D-Cache Control Register 
(ASI 0x45, VA=0x0). This situation will not occur if RAW bypassing is disabled. 


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #21: 


In Energy Star (E-star) 1/2 mode or full-speed mode, the Jalapeno 
Bus (JBus) IO (JIO) ASIC is held off of the bus by a stream of 
block stores between CPUs, resulting in a PCI Bus Error From 
System Bus (BERR). 


Applicability: 
UltraSPARC-IIIi Versions 2.3 and 2.4. 


Description: 


In the failing condition, CPU1 is doing block stores to CPU0's memory, and 
because the system is in 1/2 E-star mode where the Memory Controller Unit 
(MCU) services one memory request at a time, the data FIFOs fill up, causing 
DOK_off to assert on JBus after every store from CPU1. Once CPU0 asserts 
DOK_on, CPU1 is able to get back on JBus faster than JIO can, and thus sends 
another store to CPU0, causing CPU0 to assert DOK_off again. This live-lock 
situation continues, preventing JIO from arbitrating for JBus until the DMA 
request times out and causes a BERR. 


In Version 3.2 of UltraSPARC-IIIi, a bit was added to the JBUS_CONFIG register 
that, when programmed, causes the JBus Unit Data In Buffer (DIB) to have a 
low-water mark of 2 entries (as opposed to 3). This enables two agents to drive 
data packets to the CPU after DOK_on is asserted. There may be a slight 
performance impact with this bit enabled for certain applications. 


The bit to enable for this feature is bit 60 of the JBUS_CONFIG register, 
named DOK_mode. Setting the bit to a 1 will enable the feature. 
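

The bit manipulation described above can be sketched as follows. This is a 
minimal illustration that treats the JBUS_CONFIG value as a plain 64-bit 
integer; the helper names (set_dok_mode, dok_mode_enabled) are hypothetical, 
and how the register is actually read from and written to hardware is not 
shown. 

```python
# Sketch only: models the JBUS_CONFIG value as a 64-bit integer.
DOK_MODE_BIT = 60              # bit position stated in the text above
MASK_64 = (1 << 64) - 1        # keep results within a 64-bit register


def set_dok_mode(jbus_config: int) -> int:
    """Return the register value with DOK_mode set to 1, other bits unchanged."""
    return (jbus_config | (1 << DOK_MODE_BIT)) & MASK_64


def dok_mode_enabled(jbus_config: int) -> bool:
    """Report whether DOK_mode (bit 60) is set in the given register value."""
    return bool((jbus_config >> DOK_MODE_BIT) & 1)
```

A read-modify-write of this form leaves every other field of the register as 
it was, which matters because only the one bit is meant to change. 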
Impact: 


In Energy Star (E-star) 1/2 mode, the JBus IO ASIC (JIO) DMA does not make 
it to the Jalapeno Bus (JBus) for a very long time (i.e., greater than 5 ms), 
causing a PCI Bus Error From System Bus (BERR). This can also happen in 
full-speed mode with high SDRAM memory clock ratios. 


Workaround: 


None. 


Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #22: 


In Energy Star (E-star) 1/2 mode, the Jalapeno Bus (JBus) IO ASIC 
(JIO) is held off of the JBus by a stream of software prefetches, 
causing a PCI Bus Error From System Bus (BERR). 


Applicability: 
UltraSPARC-IIIi Versions 2.3, 2.4, and 3.2. 


Description: 


In this failure, prefetch reads are saturating the DRAM interface. The Memory 
Controller Unit (MCU) normally gives priority to reads over writes, unless writes 
are sufficiently queued up. The high watermark of the memory write queue 
(MWQ) is ignored, so writes are never drained. The DMA writes back 
up and eventually cause the PCI bus error (BERR). 
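

The starvation described above can be illustrated with a toy model. This is 
a sketch under stated assumptions, not the actual MCU logic: one request is 
serviced per step, a read is always pending, a write arrives on every other 
step while the queue has room, and the names pick_next and writes_serviced 
and the watermark value of 4 are hypothetical. 

```python
# Toy model of read-over-write priority with a write-queue high watermark.
def pick_next(reads_pending: int, writes_queued: int,
              high_watermark: int, honor_watermark: bool) -> str:
    """Choose which queue the memory controller services next."""
    if honor_watermark and writes_queued >= high_watermark:
        return "write"        # intended behavior: drain a full write queue
    if reads_pending:
        return "read"         # reads normally have priority over writes
    if writes_queued:
        return "write"
    return "idle"


def writes_serviced(steps: int, honor_watermark: bool,
                    high_watermark: int = 4) -> int:
    """Count writes drained while a read stream keeps the interface busy."""
    writes_queued = 0
    drained = 0
    for step in range(steps):
        # A new write (e.g. DMA) arrives on even steps if the queue has room.
        if step % 2 == 0 and writes_queued < high_watermark:
            writes_queued += 1
        reads_pending = 1     # the prefetch stream never lets up
        choice = pick_next(reads_pending, writes_queued,
                           high_watermark, honor_watermark)
        if choice == "write":
            writes_queued -= 1
            drained += 1
    return drained
```

With honor_watermark set to False the model never services a write while 
reads are pending, which corresponds to the backed-up DMA writes in the 
description; with it set to True, writes are drained once the queue reaches 
the watermark. 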


Impact: 


In Energy Star (E-star) 1/2 mode, JIO DMA does not make it to Jalapeno Bus 
(JBus) for a very long time (i.e., greater than 5 ms), causing PCI timeout and 
Bus Error From System Bus (BERR). 


Workaround: 


Disabling software prefetches during E-star mode reduces the number of 
outstanding reads in the system and thus avoids the read backup in the 
memory controller in E-star 1/2 mode. 
Status: 


This bug will not be fixed in future releases of the silicon. 


Erratum #23: 


An Address Space Identifier (ASI) load returns corrupted data 
following an L2 ASI flush. 


Applicability: 
UltraSPARC-IIIi Versions 2.3, 2.4, 3.2, and 3.4. 


Description: 


A flush request to ASI 0x4E may win arbitration more than once in the 
L2-cache controller due to an incorrect blocking condition in the arbiter. This 
occurs when a fill from the JBU lines up next to the ASI load or store in the 
L2-cache controller. An immediately following load to ASI 0x4E may return 
incorrect data. 


Impact: 

After an ASI 0x4E L2 flush, a following Address Space Identifier (ASI) load 
returns incorrect data. 

Workaround: 


Do not issue an ASI load immediately after L2 ASI flush routines. 
Alternatively, if an ASI load following the L2 flush routines is a 
requirement, issue consecutive ASI loads and use the data from the second 
load. 
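

The second form of the workaround (two consecutive loads, keep the second) 
can be sketched as follows. This is only an illustration: asi_load is a 
hypothetical callable standing in for the real ASI load instruction, and 
make_flaky_load is a hypothetical software stand-in for the hardware 
behavior in which the first load after the flush returns bad data. 

```python
from typing import Callable


def read_asi_after_l2_flush(asi_load: Callable[[], int]) -> int:
    """Issue two consecutive ASI loads and return the data from the second."""
    asi_load()             # first load may return incorrect data; discard it
    return asi_load()      # second load is the one to use


def make_flaky_load(good: int, bad: int) -> Callable[[], int]:
    """Hypothetical stand-in for hardware: only the first call returns bad data."""
    calls = {"count": 0}

    def load() -> int:
        calls["count"] += 1
        return bad if calls["count"] == 1 else good

    return load
```

For example, read_asi_after_l2_flush(make_flaky_load(0x1234, 0xDEAD)) 
discards the first value and returns the value from the second load. 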


Status: 


This bug will not be fixed in future releases of the silicon. 